Some near-term arm64 hardening patches

By Jonathan Corbet
November 18, 2019

The arm64 architecture is found at the core of many, if not most, mobile devices; that means that arm64 devices are destined to be the target of attackers worldwide. That has led to a high level of interest in technologies that can harden these systems. There are currently several such technologies, based in both hardware and software, that are being readied for the arm64 kernel; read on for a survey of what is coming.

E0PD

The Meltdown vulnerability enables an attacker in user space to read kernel-space data by making use of a combination of speculative execution and cache-based side channels. The kernel's defense against Meltdown is kernel page-table isolation — removing the kernel's page tables from the user-space mapping entirely. That works, but it has a significant performance cost and it can interfere with the use of other processor features. Nonetheless, it is fairly widely accepted that address-space isolation will be increasingly necessary to protect systems for some time.

There is an alternative, though: fix the hardware instead. One initiative in this area appears to be the E0PD feature, which was added as part of the Arm v8.5 extensions. Documentation on E0PD is scarce to the point of nonexistence; not even the patch set supporting it from Mark Brown describes how it works or what the acronym stands for. That said, the most informative bit of text about E0PD can be found there:

E0PD, introduced in the ARMv8.5 extensions, [...] ensures that accesses from userspace to the kernel's half of the memory map to always fault with constant time, preventing timing attacks without requiring constant unmapping and remapping or preventing legitimate accesses.

E0PD, thus, doesn't prevent speculative execution from going off into memory that user space should not be able to access, but it does block the side channel normally used to extract the data exposed by incorrectly speculated operations. Systems that support E0PD do not need to enable kernel page-table isolation and should, thus, regain the performance that it took away; no benchmark results were included with the patch set, though. E0PD support for the kernel is apparently close to ready, but the availability of processors with E0PD support may take rather longer.

Return-address signing

Arm pointer authentication is a mechanism for applying cryptographic signatures to pointers used in running code. A special instruction creates a signature for a given pointer value using a secret key; the signature is stored in the unused bits at the upper end of the pointer itself. A separate instruction verifies that a given pointer was indeed signed using a specific key. This mechanism can be used to prevent attackers from fooling the kernel into using an ill-advised pointer value.

The return-address signing patch set from Amit Daniel Kachhap uses this feature for a specific purpose: protecting the return addresses for function calls on the stack. In particular, it uses the -msign-return-address flag added to GCC 7 to build the kernel with this protection. On entry to a function, the return address is signed; when the time comes for the function to return, the signature is verified. Should the verification fail, a kernel oops will be generated and the running process will be killed.

The intent behind this work, of course, is to protect the kernel against buffer overflows or other attacks that overwrite the stack. An attacker may be able to corrupt the stack, but they should not be able to place return addresses there that will pass the verification step. That should protect the kernel against a wide range of potential attacks, since many common techniques depend on placing crafted return addresses on the stack.
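The sign-then-verify cycle described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the 48-bit address width, the use of a truncated HMAC-SHA256 as the signature, and the names `sign_pointer()` and `authenticate_pointer()` are all choices made for this illustration; the real hardware uses its own instructions and its own cipher, and the key is not visible to the code being protected.

```python
import hashlib
import hmac

VA_BITS = 48                    # assumed virtual-address width for this model
SIGNATURE_BITS = 64 - VA_BITS   # unused upper bits that hold the signature
VA_MASK = (1 << VA_BITS) - 1


class AuthenticationFailure(Exception):
    """Raised when a pointer's signature does not verify."""


def compute_signature(address: int, key: bytes) -> int:
    """Keyed MAC of the address, truncated to fit the unused upper bits."""
    digest = hmac.new(key, address.to_bytes(8, "little"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") >> (64 - SIGNATURE_BITS)


def sign_pointer(pointer: int, key: bytes) -> int:
    """Model of function entry: place the signature above the address bits."""
    address = pointer & VA_MASK
    return (compute_signature(address, key) << VA_BITS) | address


def authenticate_pointer(signed_pointer: int, key: bytes) -> int:
    """Model of function return: strip the signature, or fail if it is wrong."""
    address = signed_pointer & VA_MASK
    stored_signature = signed_pointer >> VA_BITS
    if stored_signature != compute_signature(address, key):
        raise AuthenticationFailure(f"bad signature on {address:#x}")
    return address
```

In this model, an attacker who overwrites the address bits of a saved return address without knowing the key has to guess the matching signature as well; a wrong guess raises `AuthenticationFailure`, which stands in for the kernel oops mentioned above.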

Shadow call stacks

Another approach to protecting return addresses can be seen in the shadow call stack support patch set from Sami Tolvanen. Rather than signing return addresses, this patch set uses the Clang -fsanitize=shadow-call-stack option to cause return addresses to be placed on a separate "shadow" stack located somewhere in memory. Before a function returns, it restores the return address from the shadow stack.

The current call stack tends to be some of the easiest memory for an attacker to corrupt; any buffer overflow of an automatic variable will do. With the shadow call stack, though, this sort of corruption is rendered less harmful, since return addresses no longer live on the stack. The shadow stack will typically be much harder for an attacker to modify, or to even know where it might be located. The result should, once again, be a system that is more secure against buffer-overflow attacks.
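A toy model may help show why moving return addresses off of the ordinary stack blunts a simple overflow. Everything here is hypothetical and exists only for illustration: the `ToyCpu` class, its list-of-slots frames, and the `unchecked_copy()` helper are not taken from the patch set, and the model says nothing about how Clang actually lays out the shadow stack.

```python
class ToyCpu:
    """Toy model of calls and returns; each frame is a list of integer slots."""

    def __init__(self, use_shadow_stack: bool):
        self.use_shadow_stack = use_shadow_stack
        self.frames = []        # ordinary stack: one list of slots per active call
        self.shadow_stack = []  # separate region that holds only return addresses

    def call(self, return_address: int, buffer_slots: int) -> None:
        """Function entry: make room for a local buffer, save the return address."""
        frame = [0] * buffer_slots
        if self.use_shadow_stack:
            self.shadow_stack.append(return_address)
        else:
            frame.append(return_address)  # saved just past the end of the buffer
        self.frames.append(frame)

    def unchecked_copy(self, data) -> None:
        """Copy into the local buffer without checking the buffer's size.

        Writes past the buffer spill into whatever else the frame holds.
        """
        frame = self.frames[-1]
        for index, value in enumerate(data[:len(frame)]):
            frame[index] = value

    def ret(self) -> int:
        """Function exit: report the address at which execution resumes."""
        frame = self.frames.pop()
        if self.use_shadow_stack:
            return self.shadow_stack.pop()
        return frame[-1]
```

With `use_shadow_stack=False`, a five-slot write into a four-slot buffer replaces the saved return address, so `ret()` hands back the attacker's value. With `use_shadow_stack=True`, the same write finds no return address to clobber and `ret()` still reports the original one.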

Return-address signing and shadow call stacks appear to be two different approaches to the same problem; one probably does not want to use both of them. In the cover letter, Tolvanen addresses the question of which should be used:

[The shadow call stack] has a minimal performance overhead, but allocating shadow stacks increases kernel memory usage. The feature is therefore mostly useful on hardware that lacks support for PAC instructions.

In other words, processors that can do pointer authentication should use that feature; shadow call stacks are there for those without that support. This patch set seems to be about ready; it is currently earmarked for the 5.6 merge window.

Branch target identification

The last of the arm64 features under consideration is branch-target identification (BTI), which is intended to trap wild jumps. The idea is simple enough: if BTI is enabled, the first instruction encountered after an indirect jump must be a special BTI instruction. That instruction is a no-op on systems without BTI; with BTI, it has the added benefit of not throwing a fault should it be jumped to. Jumps to locations that do not feature a BTI instruction, instead, will lead to the quick death of the process involved.

BTI, thus, is a way of marking code that is meant to be the target of an indirect jump, thwarting attacks that somehow convince the kernel to jump to some random spot. That should block a range of attacks based on, for example, overwriting a structure full of function pointers called by the kernel. It is interesting to note that BTI does not check the target of a return from a function; the intent is that return-address signing should be used to protect returns. The GCC 9 release includes support for BTI.
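The landing-pad rule can be sketched in a few lines. This is a deliberately simplified model: the instruction stream is a list of mnemonic strings, and the `indirect_branch()` helper and `CODE` list are hypothetical names invented for this example rather than anything found in the kernel or the toolchain.

```python
class BranchTargetFault(Exception):
    """Raised when an indirect branch lands on something other than "bti"."""


def indirect_branch(code, target: int, bti_enabled: bool) -> int:
    """Return the index where execution continues after an indirect jump.

    With BTI enabled, the first instruction at the target must be "bti";
    without it, any target is accepted.
    """
    if bti_enabled and code[target] != "bti":
        raise BranchTargetFault(f"no BTI landing pad at index {target}")
    return target


# Hypothetical instruction stream: two functions, each starting with "bti".
CODE = ["bti", "mov", "ret", "bti", "ldr", "add", "ret"]
```

A corrupted function pointer that points at index 4 (the middle of the second function) is accepted when `bti_enabled` is false, but raises `BranchTargetFault` when it is true; a pointer to index 3, the function's marked entry point, works either way.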

Each of these technologies addresses one piece of the problem of protecting arm64 systems from attackers. Put together, they should have the effect of making these systems into significantly harder targets. The arms race will not end, and attackers will certainly find ways of getting around these techniques, at least some of the time. But, with luck, they will find themselves being frustrated more often in the future.

Some near-term arm64 hardening patches

Posted Nov 18, 2019 20:15 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link] (9 responses)

A message from 2019 to 1988: Real-world security issues usually involve getting access to some entity's customer database by breaking into a web site. The mad effort to blast the body of the already completely dead Morris worm into subatomic particles can thus be halted as "no value for money".

Some near-term arm64 hardening patches

Posted Nov 18, 2019 20:53 UTC (Mon) by roc (subscriber, #30627) [Link] (2 responses)

Exploiting memory safety bugs is a huge business. Other kinds of security bugs may or may not be more numerous, but that has no impact on the cost/benefit analysis of these mitigations.

It is true that stack buffer overflows are quite uncommon these days, so you could argue that shadow stacks aren't worth the complexity.

Some near-term arm64 hardening patches

Posted Nov 18, 2019 21:22 UTC (Mon) by luto (guest, #39314) [Link] (1 responses)

A sufficiently strong shadow stack implementation also makes ROP nearly impossible, and doing *something* to protect return addresses is an important part of CFI.

Some near-term arm64 hardening patches

Posted Nov 18, 2019 21:49 UTC (Mon) by roc (subscriber, #30627) [Link]

That's true. I had assumed return-address-signing was going to take care of ROP.

Some near-term arm64 hardening patches

Posted Nov 18, 2019 21:02 UTC (Mon) by dvdeug (guest, #10998) [Link]

In 2019 BC, robbers would kick down doors to enter houses or rob people at knife point. In 2019 AD, they still do. In 2014, the security on the Nintendo 3DS was broken because the Cube Ninja game didn't bounds check data downloaded from the Net. In 2019, Exim had a remotely exploitable buffer overflow (CVE-2019-16928). There may be fancy new hacks, but people will use whatever works, be it old or new.

Some near-term arm64 hardening patches

Posted Nov 18, 2019 21:35 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (4 responses)

Phew, I'll let our security monitoring teams know that none of the attacks they see actually exist.

Some near-term arm64 hardening patches

Posted Nov 20, 2019 9:03 UTC (Wed) by nhippi (subscriber, #34640) [Link] (3 responses)

Imagine if most of the energy spent on low-level hardening were put into improving the usability of password managers, 2FA, and other security UI/UX issues? Of course that isn't happening, because hardening helps cloud-provider megacorps while the latter would help average people avoid getting their passwords guessed/phished all around the world.

Some near-term arm64 hardening patches

Posted Nov 20, 2019 11:41 UTC (Wed) by roc (subscriber, #30627) [Link] (1 responses)

A lot of energy is going into password managers (e.g. Firefox Lockwise), security keys (Yubikey etc), FIDO/WebAuthn, etc etc, and efforts to promote the deployment of that tech, to bypass the password problem. None of that is worth much if the users' software is easily hijacked by malicious input.

We have to solve the UX issues *and* we have to solve the fragile software issues. The resources to address the latter (e.g. kernel developers) aren't easily repurposed to tackle the former, and even if they were, where's your *proof* that that would be the right thing to do? Postulating a conspiracy theory isn't proof.

Some near-term arm64 hardening patches

Posted Nov 20, 2019 11:47 UTC (Wed) by roc (subscriber, #30627) [Link]

To be clear: hardening Linux, Android and client software in general against exploitation definitely helps block attacks against the cheap phones of "average people" all around the world, and it also helps block attacks against the services those people depend on (whether they know it or not).

And FWIW, the "cloud provider megacorps" have a keen interest in improving security UI/UX just as much as preventing software exploitation. AWS lets me use my phishing-proof Yubikey for authentication, but my bank doesn't yet :-(.

Some near-term arm64 hardening patches

Posted Nov 20, 2019 17:15 UTC (Wed) by rgmoore (✭ supporter ✭, #75) [Link]

hardening helps cloud-provider megacorps while the latter would help average people avoid getting their passwords guessed/phished all around the world.

This is a nonsensical distinction. One of the things those "cloud-provider megacorps" are doing is handling data that affects millions of "average people". That makes them an incredibly tempting target for hackers, because it means they can steal information wholesale instead of retail. I very much want those big companies to put serious effort into protecting their data.

Some near-term arm64 hardening patches

Posted Nov 18, 2019 20:35 UTC (Mon) by MarkRutland (subscriber, #74197) [Link] (2 responses)

The latest ARMv8-A manual describes E0PD in the section titled "Preventing EL0 access to halves of the address map", which summarises the feature:

If ARMv8.5-E0PD is implemented and enabled, the TCR_ELx.{E0PD0, E0PD1} fields can prevent unprivileged access to the addresses translated by TTBR0_ELx or TTBR1_ELx. If access is prevented, the fault is reported as a level 0 fault, and should take the same time to generate, whether the address is present in the TLB or not, to mitigate attacks that use fault timing.

Setting TCR_ELx.E0PD1 should prevent userspace (EL0) accesses to the kernel half of the address space (which is mapped via TTBR1_ELx), speculative or otherwise. The constant-time faulting behaviour should prevent page table depth probing attacks that can be used against KASLR.
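As a rough illustration of the behaviour quoted from the manual, the decision can be modelled as a check made on the address alone, before any TLB or page-table lookup happens. This is a sketch under assumptions, not a description of the hardware: the `TranslationControl` class and `unprivileged_access()` function are hypothetical, E0PD0 is assumed to pair with TTBR0 and E0PD1 with TTBR1 (following the order of the fields in the quoted text), and bit 55 of the address is assumed to select between the two halves of the map.

```python
from dataclasses import dataclass

LEVEL_0_FAULT = "level 0 fault"


@dataclass
class TranslationControl:
    """Hypothetical stand-in for the two E0PD fields of TCR_ELx."""
    e0pd0: bool = False  # fault unprivileged accesses translated via TTBR0
    e0pd1: bool = False  # fault unprivileged accesses translated via TTBR1


def uses_ttbr1(virtual_address: int) -> bool:
    """Assumption of this model: bit 55 selects the upper (kernel) half."""
    return bool((virtual_address >> 55) & 1)


def unprivileged_access(virtual_address: int, tcr: TranslationControl, translate):
    """Model an EL0 access; `translate` stands in for the TLB/page-table walk."""
    blocked = tcr.e0pd1 if uses_ttbr1(virtual_address) else tcr.e0pd0
    if blocked:
        # Decided from the address alone, so the outcome cannot depend on
        # whether (or how deeply) the address is mapped.
        return LEVEL_0_FAULT
    return translate(virtual_address)
```

The point of the model is that `translate` is never called for a blocked address, which is the property that should make the fault timing independent of the state of the TLB and page tables.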

Some near-term arm64 hardening patches

Posted Nov 18, 2019 23:58 UTC (Mon) by nivedita76 (subscriber, #121790) [Link] (1 responses)

The documentation and the commit message should probably make that first bit more explicit -- i.e. that /speculative/ accesses are indeed prevented.

Reading the commit message as it stands doesn't give any indication as to why E0PD would prevent Meltdown, as it only mentions constant-time faulting.

Some near-term arm64 hardening patches

Posted Apr 6, 2020 17:50 UTC (Mon) by mwsealey (subscriber, #71282) [Link]

Speculative accesses aren't permitted to cause exceptions, so whether or not a 'level 0 fault' is generated in constant time makes no difference.

Some near-term arm64 hardening patches

Posted Nov 19, 2019 0:04 UTC (Tue) by SethT (guest, #135064) [Link]

Tolvanen's cover letter quote (and this article) doesn't explicitly define PAC, which stands for Pointer Authentication Code; the mechanism was outlined, but not named by its abbreviation, in the return-address signing section.

Some near-term arm64 hardening patches

Posted Nov 19, 2019 13:28 UTC (Tue) by xnox (guest, #63320) [Link] (4 responses)

About return-address signing: can an attacker not request signatures for all possible pointer addresses and store all of them on disk, then corrupt a return address and load the matching signature from disk?

What is used to sign the pointers?

Some near-term arm64 hardening patches

Posted Nov 19, 2019 14:30 UTC (Tue) by Paf (subscriber, #91811) [Link] (3 responses)

“All possible pointer addresses and store them on disk”
No, since that’s some pretty large fraction of 2^64.

The question of what’s used for signing and where it’s kept is still interesting, though.

Some near-term arm64 hardening patches

Posted Nov 19, 2019 17:53 UTC (Tue) by brouhaha (subscriber, #1698) [Link] (2 responses)

Since the 64 bits of a pointer are being partitioned into some upper bits for the signature, and the remaining lower bits for the virtual address, if you know the virtual address you're interested in you only have to try 2^(signature_width) possibilities. So if the split is e.g. 16 bits for the signature and 48 bits for the virtual address (a common virtual address size on 64-bit processors), that will only require on average 32768 attempts to brute-force the right signature for a given address.

Despite 2^64 being a rather big number (over 16 million TiB), I am still reminded of all the times that people have in the past abused the high bits of addresses thinking that they will never be needed for actual addresses. Two notable examples are IBM mainframes and the Motorola MC68000, both of which originally only used the low 24 bits of addresses, so the high parts were often used for other stuff, which caused huge problems when they expanded the address size (to 31 bits for System/370-XA and MC68012, 32 bits for MC68020).
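For anyone who wants to check the arithmetic, a few lines suffice. The helper names are made up for this example, and the 16/48 split is just the one assumed above:

```python
def average_guesses(signature_bits: int) -> int:
    """Roughly half of the 2**signature_bits candidates, on average."""
    return 2 ** signature_bits // 2


def address_space_in_tib(address_bits: int) -> int:
    """Size of a byte-addressed space in binary tebibytes (2**40 bytes)."""
    return 2 ** address_bits // 2 ** 40


def address_space_in_tb(address_bits: int) -> int:
    """Size of a byte-addressed space in decimal terabytes (10**12 bytes)."""
    return 2 ** address_bits // 10 ** 12
```

A 16-bit signature gives `average_guesses(16) == 32768`. A full 64-bit space is 2^24 = 16,777,216 TiB in binary units, or a bit over 18.4 million TB in decimal units.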

Some near-term arm64 hardening patches

Posted Nov 20, 2019 1:01 UTC (Wed) by Paf (subscriber, #91811) [Link]

Ah, I misunderstood - I (incorrectly) read this as pointer encryption, but this is just signing with something stuck in the upper bits. (Clearly described in the article, in some lines I clearly skipped.)

Then, yes, you’re absolutely right, it’s not that large a space at all.

Some near-term arm64 hardening patches

Posted Nov 20, 2019 11:31 UTC (Wed) by james (subscriber, #1325) [Link]

I am still reminded of all the times that people have in the past abused the high bits of addresses thinking that they will never be needed for actual addresses. Two notable examples are IBM mainframes and the Motorola MC68000 [...]

You're missing a much more up-to-date example, ARM64 itself, which has had pointer tagging built in from the beginning.

The kernel configures the translation tables so that translations made via TTBR0 (i.e. userspace mappings) have the top byte (bits 63:56) of the virtual address ignored by the translation hardware. This frees up this byte for application use.

-- https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt

(Also, early ARM used the top six bits of the program counter for status flags and the bottom two bits for mode flags, making it faster to save state when handling interrupts.)